Slovak Web Discussion Corpus

Authors

  • Daniel Hládek
  • Ján Stas
  • Jozef Juhár
Abstract

The proposed form and annotations should enable further classical and computational linguistic research into a contemporary form of communication: web discussions. The corpus should be large enough for statistical analysis of word connotations, language modeling, document classification, clustering, and information retrieval tasks. Future effort will focus on processing data from social networks.

Similar resources

Slovak National Corpus tools and resources

The article presents current state of affairs in several projects conducted by the Slovak National Corpus department of the Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences. We describe the Slovak National Corpus, Corpus of Spoken Slovak, tools used for linguistics analysis and an ongoing effort to create Slovak WordNet. 1 Slovak National Corpus The Slovak National Corpus is a huge...

5th Workshop on Intelligent and Knowledge oriented Technologies

The article presents current state of affairs in several projects conducted by the Slovak National Corpus department of the Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences. We describe the Slovak National Corpus, Corpus of Spoken Slovak, tools used for linguistics analysis and an ongoing effort to create Slovak WordNet. 1 Slovak National Corpus The Slovak National Corpus is a huge...

Are Web Corpora Inferior? The Case of Czech and Slovak

Our paper describes an experiment aimed at assessing lexical coverage in web corpora in comparison with traditional ones for two closely related Slavic languages from the lexicographers’ perspective. The preliminary results show that web corpora should not be considered “inferior”, but rather “different”.

TUKE-BNews-SK: Slovak Broadcast News Corpus Construction and Evaluation

This article presents an overview of the existing acoustical corpuses suitable for broadcast news automatic transcription task in the Slovak language. The TUKE-BNews-SK database created in our department was built to support the application development for automatic broadcast news processing and spontaneous speech recognition of the Slovak language. The audio corpus is composed of 479 Slovak TV...

Opinion Mining in Conversational Content within Web Discussions and Commentaries

The paper focuses on the problem of opinion classification related to web discussions and commentaries. It introduces various approaches known in this field. It also describes novel methods, which have been designed for short conversational content processing with emphasis on dynamic analysis. This dynamic analysis is focused mainly on processing of negations and intensifiers within the opini...

Publication date: 2014